Siamese Network with Interactive Transformer for Video Object Segmentation

Abstract

Semi-supervised video object segmentation (VOS) refers to segmenting the target in the remaining frames given its annotation in the first frame, and it has been actively studied in recent years. The key challenge lies in finding effective ways to exploit the spatio-temporal context of past frames to help learn a discriminative representation of the current frame. In this paper, we propose a novel Siamese network with a specifically designed interactive transformer, called SITVOS, to enable context propagation from historical frames to the current frame. Technically, we use the transformer encoder and decoder to handle the past frames and the current frame separately, i.e., the encoder encodes a robust spatio-temporal context from the past frames, while the decoder takes the feature embedding of the current frame as the query to retrieve the target from the encoder output. To further enhance the target representation, a feature interaction module (FIM) is devised to promote information flow between the encoder and decoder. Moreover, we employ the Siamese architecture to extract backbone features of both past and current frames, which enables feature reuse and is more efficient than existing methods. Experimental results on three challenging benchmarks validate the superiority of SITVOS over state-of-the-art methods. Code is available at https://github.com/LANMNG/SITVOS.
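
The retrieval step described in the abstract, where the current-frame embedding acts as a query against encoded past-frame context, can be illustrated with plain scaled dot-product cross-attention. The following is a minimal sketch under stated assumptions, not the SITVOS implementation: the function names, the toy token values, and the use of a single attention head without learned projections are all hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query retrieves a weighted mix of values."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        # Similarity between this query token and every memory token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the memory values.
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


# Hypothetical toy data (assumption): three past-frame memory tokens, each with
# a one-dimensional value standing in for "how strongly this token belongs to
# the target", and two current-frame query tokens.
memory_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory_values = [[1.0], [0.0], [1.0]]
current_queries = [[4.0, 0.0], [0.0, 4.0]]

retrieved = cross_attention(current_queries, memory_keys, memory_values)
```

In this toy setup the first query aligns with two target-like memory tokens, so it retrieves a higher target score than the second query. A real model would add learned query, key, and value projections, multiple heads, and positional information.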

Related resources

Interactive Video Object Segmentation in the Wild

In this paper we present our system for human-in-the-loop video object segmentation. The backbone of our system is a method for one-shot video object segmentation [3]. While fast, this method requires an accurate pixel-level segmentation of one (or several) frames as input. As manually annotating such a segmentation is impractical, we propose a deep interactive image segmentation method, that ca...

An interactive authoring system for video object segmentation and annotation

An interactive authoring system is proposed for semi-automatic video object (VO) segmentation and annotation. This system features a new contour interpolation algorithm, which enables the user to define the contour of a VO on multiple frames while the computer automatically interpolates the missing contours of this object on every frame. A typical active contour (snake) model is adapted and the c...

Efficient Video Object Segmentation via Network Modulation

Video object segmentation aims to segment a specific object throughout a video sequence, given only an annotated first frame. Recent deep-learning-based approaches find it effective to fine-tune a general-purpose segmentation model on the annotated frame using hundreds of iterations of gradient descent. Despite the high accuracy these methods achieve, the fine-tuning process is ineffici...

Similarity Mapping with Enhanced Siamese Network for Multi-Object Tracking

Multi-object tracking has recently become an important area of computer vision, especially for Advanced Driver Assistance Systems (ADAS). Despite growing attention, achieving high-performance tracking is still challenging, with state-of-the-art systems resulting in high complexity with a large number of hyperparameters. In this paper, we focus on reducing overall system complexity and the numbe...

A Neural Network based Scheme for Unsupervised Video Object Segmentation

In this paper, we propose a neural-network-based scheme for performing unsupervised video object segmentation, especially for videophone or videoconferencing applications. The procedure includes (a) a training algorithm for adapting the network weights to the current condition, (b) a Maximum A Posteriori (MAP) estimation procedure for optimally selecting the most representative data of the cur...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i2.20009